Abstract. We study three different kinds of embeddings of tree patterns: weakly-injective, ancestor-preserving, and lca-preserving. While each of them is often referred to as injective embedding, they form a proper hierarchy and their computational properties vary (from P to NP-complete). We present a thorough study of the complexity of the model checking problem, i.e., is there an embedding of a given tree pattern in a given tree, and we investigate the impact of various restrictions imposed on the tree pattern: bound on the degree of a node, bound on the height, and type of allowed labels and edges.



1 Introduction 

An embedding is a fundamental notion with numerous applications in computer science, e.g., in graph pattern matching (cf. [4]). Usually, an embedding is defined as a structure-preserving mapping that is typically required to be injective. Tree patterns are a special class of graph patterns that have found applications, for instance, in XML databases, where they form a functional equivalent of (acyclic) conjunctive queries for relational databases. Tree patterns are typically matched against trees and are allowed to use special descendant edges (double lines in Fig. 1) that can be mapped to paths rather than to single edges, as is the case with the standard child edges.

Traditionally, the semantics of tree patterns for XML is defined using non-injective embeddings [11,15] (Fig. 1(a)), which is reminiscent of relational data.



Since XML data has more structure, it makes sense to exploit the tree structure when defining tree pattern embeddings. In this context, it is interesting to consider injective embeddings [3,5,8,11]. However, the use of descendant edges makes it cumbersome to define what exactly an injective embedding of a tree pattern should be, and consequently, different notions have been employed.

A weakly-injective embedding requires only the mapping to be injective, and recent developments in graph matching suggest that such embeddings are crucial for expressing important patterns occurring in real-life databases [5]. They are a natural choice when we do not wish to constrain in any way the vertical relationship of the images of two children of some node connected with descendant edges. However, descendant edges can be mapped to paths that interleave, which means that even if there is a weakly-injective embedding between a tree pattern and a tree, there need not be a structural similarity between the tree and the tree pattern (Fig. 1(b)). This is contrary to the structure-preserving nature of embeddings, hence the prefix weakly. One could strengthen the restriction and prevent the embedding from introducing vertical relationships between the nodes, which gives us ancestor-preserving embeddings [3]. In this case two descendant edges are mapped into paths that might overlap at the beginning but eventually branch (Fig. 1(c)). Finally, we can go one step further and require the paths not to overlap at all, which translates to lca-preserving embeddings [5], i.e., embeddings that preserve lowest common ancestors of any pair of nodes (Fig. 1(d)).

Unfortunately, there is a lack of a systematic and thorough treatment of injective embeddings, and there is a tendency to call each of the embeddings above simply injective, which is potentially confusing and error-prone. This paper fills this gap and shows that injective embeddings form a proper hierarchy and that their computational properties vary significantly (from P to NP-complete). This further strengthens our belief that the different injective embeddings should not be confused. More precisely, we study the complexity of the model checking problem, i.e., given a tree pattern p and a tree t, is there an embedding (of a given type) of p in t, and we investigate the impact of various restrictions imposed on the tree pattern: bound on the degree of a node, bound on the height, and type of allowed labels and edges.

Our results show that while lca-preserving embeddings are in P, both weakly-injective and ancestor-preserving embeddings are NP-complete. Bounding the height of the pattern practically does not change the picture, but bounding the degree of a node in the pattern renders ancestor-preserving embeddings tractable, while weakly-injective embeddings remain NP-complete. The high complexity springs from the use of descendant edges: if we disallow them, the hierarchy collapses and all injective embeddings fall into P. On the other hand, the use of node labels is not essential: the complexity remains unchanged even if we consider tree patterns using the wildcard symbol only, essentially patterns that query only structural properties of the tree.

Injective embeddings of tree patterns are closely related to a number of well-established and studied notions, including tree inclusion [12,18], minor containment [16,17], subgraph homeomorphism [2,14], and graph pattern matching [5]. Not surprisingly, some of our results are subsumed by or can be easily obtained from existing results, and conversely, some existing results are subsumed by ours (see Sec. 5 for a complete discussion of related work). The principal aim of this paper is, however, to catalog the different kinds of injective embeddings of tree patterns and to identify which aspects of tree patterns lead to intractability. To that end, all our reductions and algorithms are new, and the reductions clearly illustrate the source of complexity of injective tree patterns.

This paper is organized as follows. In Sec. 2 we define basic notions and in Sec. 3 we formally define the three types of injective embeddings of tree patterns. In Sec. 4 we study the model checking problem for injective embeddings. Discussion of related work is in Sec. 5, and in Sec. 6 we summarize our results and outline further directions of study. Some proofs have been moved to the appendix.

2 Preliminaries 

We assume a fixed and finite set of node labels $\Sigma$ and use a wildcard symbol $*$ not present in $\Sigma$. A tree pattern [11] is a tuple $p = (N_p, \mathit{root}_p, \mathit{lab}_p, \mathit{child}_p, \mathit{desc}_p)$, where $N_p$ is a finite set of nodes, $\mathit{root}_p \in N_p$ is the root node, $\mathit{lab}_p : N_p \to \Sigma \cup \{*\}$ is a labeling function, $\mathit{child}_p \subseteq N_p \times N_p$ is a set of child edges, and $\mathit{desc}_p \subseteq N_p \times N_p$ is a set of (proper) descendant edges. We assume that $\mathit{child}_p \cap \mathit{desc}_p = \emptyset$ and that the relation $\mathit{child}_p \cup \mathit{desc}_p$ is acyclic, and we require every non-root node to have exactly one predecessor in this relation. A tree is a tree pattern that has no descendant edges and uses no wildcard symbols $*$.

An example of a tree pattern can be found in Fig. 1 (descendant edges are drawn with double lines). Sometimes, we use unranked terms to represent trees and the standard XPath syntax to represent tree patterns. XPath allows us to navigate the tree with a syntax similar to the directory paths used in the UNIX file system. For instance, in Fig. 1, $p_0$ can be written as f/a[.//b/c]//b. In the sequel, we use $p, p_0, p_1, \ldots$ to range over tree patterns and $t, t_0, t_1, \ldots$ to range over trees.

Given a binary relation $R$, we denote by $R^+$ the transitive closure of $R$, and by $R^*$ the transitive and reflexive closure of $R$. Now, fix a pattern $p$ and take two of its nodes $n, n' \in N_p$. We say that $n'$ is a |-child of $n$ if $(n, n') \in \mathit{child}_p$, $n'$ is a ||-child of $n$ if $(n, n') \in \mathit{desc}_p$, and $n'$ is simply a child of $n$ in $p$ if $(n, n') \in \mathit{child}_p \cup \mathit{desc}_p$. Also, $n'$ is a descendant of $n$, and $n$ an ancestor of $n'$, if $(n, n') \in (\mathit{child}_p \cup \mathit{desc}_p)^*$. Note that descendantship and ancestorship are reflexive: a node is its own ancestor and its own descendant. The depth of a node $n$ in $p$ is the length of the path from the root node $\mathit{root}_p$ to $n$; here, a path is a sequence of edges, and in particular, the depth of the root node is 0. The lowest common ancestor of $n$ and $n'$ in $p$, denoted by $\mathit{lca}_p(n, n')$, is the deepest node that is an ancestor of both $n$ and $n'$. The size of a tree pattern $p$, denoted $|p|$, is the number of its nodes. The degree of a node $n$, denoted $\deg_p(n)$, is the number of its children. The height of a tree pattern $p$, denoted $\mathit{height}(p)$, is the depth of its deepest node.

The standard semantics of tree patterns is defined using non-injective embeddings, which map the nodes of a tree pattern to the nodes of a tree in a manner that respects the wildcard and the semantics of the edges. Formally, an embedding of a tree pattern $p$ in a tree $t$ is a function $h : N_p \to N_t$ such that:

1. $h(\mathit{root}_p) = \mathit{root}_t$,

2. for every $(n, n') \in \mathit{child}_p$, $(h(n), h(n')) \in \mathit{child}_t$,

3. for every $(n, n') \in \mathit{desc}_p$, $(h(n), h(n')) \in (\mathit{child}_t)^+$,

4. for every $n \in N_p$, $\mathit{lab}_t(h(n)) = \mathit{lab}_p(n)$ unless $\mathit{lab}_p(n) = *$.

We write $t \preccurlyeq_{\mathrm{std}} p$ if there exists a (standard) embedding of $p$ in $t$. Note that the semantics of a descendant edge of the tree pattern is in fact that of a proper descendant: a descendant edge is mapped to a nonempty path in the tree.
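To make conditions 1-4 concrete, here is a minimal Python sketch that checks whether a given mapping h is a standard embedding. The encoding is our own assumption, not the paper's: nodes are integers 0..N-1 with root 0, stored as parent arrays, and a flag marks the edges that are descendant edges. `Pattern`, `is_proper_ancestor`, and `is_embedding` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pattern:
    """Tree pattern with nodes 0..N-1 and root 0 (a tree is the special case
    without descendant edges and without '*')."""
    label: List[str]              # '*' is the wildcard
    parent: List[Optional[int]]   # parent[0] is None
    desc: List[bool]              # desc[n]: the edge into n is a descendant edge


def is_proper_ancestor(parent: List[Optional[int]], a: int, b: int) -> bool:
    """True iff there is a nonempty downward path from a to b."""
    b = parent[b]
    while b is not None:
        if b == a:
            return True
        b = parent[b]
    return False


def is_embedding(p: Pattern, t: Pattern, h: Dict[int, int]) -> bool:
    """Check conditions 1-4 of a standard (possibly non-injective) embedding."""
    if h[0] != 0:                                              # 1. root to root
        return False
    for n, m in enumerate(p.parent):
        if m is not None:
            if p.desc[n]:                                      # 3. nonempty path
                if not is_proper_ancestor(t.parent, h[m], h[n]):
                    return False
            elif t.parent[h[n]] != h[m]:                       # 2. child to child
                return False
        if p.label[n] != '*' and p.label[n] != t.label[h[n]]:  # 4. labels
            return False
    return True


p = Pattern(['r', 'a', 'a'], [None, 0, 0], [False, True, True])  # r[.//a][.//a]
t = Pattern(['r', 'a'], [None, 0], [False, False])               # r(a)
print(is_embedding(p, t, {0: 0, 1: 1, 2: 1}))                    # True
```

The example maps both a-nodes of r[.//a][.//a] to the single a-node of r(a), which conditions 1-4 allow; this is precisely what the injective variants rule out.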



3 Injective embeddings 

We identify three subclasses of injective embeddings that restrict the standard embedding by adding one additional condition each. First, we have the weakly-injective embedding of $p$ in $t$ ($t \preccurlyeq_{\mathrm{inj}} p$):

5'. $h$ is an injective function, i.e., $h(n_1) \ne h(n_2)$ for any two different nodes $n_1$ and $n_2$ of $p$.

Next, we have the ancestor-preserving embedding of $p$ in $t$ ($t \preccurlyeq_{\mathrm{anc}} p$):

5''. $h(n_1)$ is an ancestor of $h(n_2)$ in $t$ if and only if $n_1$ is an ancestor of $n_2$ in $p$, for any two nodes $n_1$ and $n_2$ of $p$. More formally, for any $n_1, n_2 \in N_p$
$$(h(n_1), h(n_2)) \in \mathit{child}_t^* \iff (n_1, n_2) \in (\mathit{child}_p \cup \mathit{desc}_p)^*.$$

Finally, we have the lca-preserving embedding of $p$ in $t$ ($t \preccurlyeq_{\mathrm{lca}} p$):

5'''. $h$ maps the lowest common ancestor of nodes $n_1$ and $n_2$ to the lowest common ancestor of $h(n_1)$ and $h(n_2)$, i.e., for any pair of nodes $n_1$ and $n_2$ of $p$ we have $\mathit{lca}_t(h(n_1), h(n_2)) = h(\mathit{lca}_p(n_1, n_2))$.
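The three extra conditions can likewise be tested directly on a candidate mapping. The Python sketch below assumes h already satisfies conditions 1-4 and checks 5', 5'', and 5''' by brute force over all pairs of pattern nodes; the parent-array encoding (root 0, `None` as its parent) and the function names are illustrative assumptions. The two small trees in the example are our own and separate the notions for the pattern r[.//a][.//b].

```python
from itertools import product
from typing import Dict, List, Optional

Parent = List[Optional[int]]   # parent[n]; None marks the root


def ancestors(parent: Parent, n: Optional[int]) -> List[int]:
    """Ancestors of n from n itself up to the root (ancestorship is reflexive)."""
    out = []
    while n is not None:
        out.append(n)
        n = parent[n]
    return out


def lca(parent: Parent, a: int, b: int) -> int:
    """Deepest common ancestor: the first ancestor of b that is also one of a."""
    anc_a = set(ancestors(parent, a))
    return next(x for x in ancestors(parent, b) if x in anc_a)


def weakly_injective(h: Dict[int, int]) -> bool:                             # 5'
    return len(set(h.values())) == len(h)


def ancestor_preserving(pp: Parent, tp: Parent, h: Dict[int, int]) -> bool:  # 5''
    return all((h[n1] in ancestors(tp, h[n2])) == (n1 in ancestors(pp, n2))
               for n1, n2 in product(h, repeat=2))


def lca_preserving(pp: Parent, tp: Parent, h: Dict[int, int]) -> bool:       # 5'''
    return all(lca(tp, h[n1], h[n2]) == h[lca(pp, n1, n2)]
               for n1, n2 in product(h, repeat=2))


pp = [None, 0, 0]            # pattern r[.//a][.//b]: 0 = r, 1 = a, 2 = b
chain = [None, 0, 1]         # tree r(a(b)): a becomes an ancestor of b
fork = [None, 0, 1, 1]       # tree r(c(a, b)): both paths share the edge r-c
h_chain = {0: 0, 1: 1, 2: 2}
h_fork = {0: 0, 1: 2, 2: 3}
print(weakly_injective(h_chain), ancestor_preserving(pp, chain, h_chain))     # True False
print(ancestor_preserving(pp, fork, h_fork), lca_preserving(pp, fork, h_fork))  # True False
```

In the chain tree the images of the two descendant edges interleave, so only 5' holds; in the fork tree the two paths overlap on r-c before branching, so 5'' holds but the lowest common ancestor moves from r to c.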

In Fig. 1 we illustrate various embeddings of a tree pattern $p_0$.

Fig. 1. Embeddings of a tree pattern $p_0$: (a) non-injective, $t_0 \preccurlyeq_{\mathrm{std}} p_0$; (b) weakly-injective, $t_1 \preccurlyeq_{\mathrm{inj}} p_0$; (c) ancestor-preserving, $t_2 \preccurlyeq_{\mathrm{anc}} p_0$; (d) lca-preserving.



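We point out that injective embeddings form a hierarchy; in particular, lca-preserving and ancestor-preserving embeddings are weakly-injective.

Proposition 3.1. For any tree $t$ and tree pattern $p$, 1) $t \preccurlyeq_{\mathrm{lca}} p \Rightarrow t \preccurlyeq_{\mathrm{anc}} p$, 2) $t \preccurlyeq_{\mathrm{anc}} p \Rightarrow t \preccurlyeq_{\mathrm{inj}} p$, and 3) $t \preccurlyeq_{\mathrm{inj}} p \Rightarrow t \preccurlyeq_{\mathrm{std}} p$.

It is also easy to see that the hierarchy is proper. For that, take Fig. 1 and note that $t_0 \preccurlyeq_{\mathrm{std}} p_0$ but $t_0 \not\preccurlyeq_{\mathrm{inj}} p_0$, $t_1 \preccurlyeq_{\mathrm{inj}} p_0$ but $t_1 \not\preccurlyeq_{\mathrm{anc}} p_0$, and finally, $t_2 \preccurlyeq_{\mathrm{anc}} p_0$ but $t_2 \not\preccurlyeq_{\mathrm{lca}} p_0$. We point out, however, that the hierarchy of injective embeddings collapses if we disallow descendant edges in tree patterns.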

Proposition 3.2. For any tree $t$ and any tree pattern $p$ that does not use descendant edges, $t \preccurlyeq_{\mathrm{inj}} p$ iff $t \preccurlyeq_{\mathrm{anc}} p$ iff $t \preccurlyeq_{\mathrm{lca}} p$.

Furthermore, if we consider path patterns, i.e., tree patterns whose nodes have at most one child, there is no difference between any of the injective embeddings and the standard embedding.

Proposition 3.3. For any tree $t$ and any path pattern $p$, $t \preccurlyeq_{\mathrm{std}} p$ iff $t \preccurlyeq_{\mathrm{inj}} p$ iff $t \preccurlyeq_{\mathrm{anc}} p$ iff $t \preccurlyeq_{\mathrm{lca}} p$.

4 Complexity of injective embeddings 

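For a type of embedding $\theta \in \{\mathrm{inj}, \mathrm{anc}, \mathrm{lca}\}$ we define the corresponding (unconstrained) decision problem:
$$M_\theta = \{(t, p) \mid t \preccurlyeq_\theta p\}.$$
Additionally, we investigate several constrained variants of this problem. First, we restrict the degree of nodes in the tree pattern by a constant $k \ge 0$,
$$M_\theta^{\deg\le k} = \{(t, p) \mid t \preccurlyeq_\theta p,\ \forall n \in N_p.\ \deg_p(n) \le k\}.$$
Next, we define the restriction of the height of the tree pattern by a constant $k \ge 0$,
$$M_\theta^{\mathrm{height}\le k} = \{(t, p) \mid t \preccurlyeq_\theta p,\ \mathit{height}(p) \le k\}.$$
We also investigate the importance of labels in tree patterns as opposed to patterns that are label-oblivious and query only the structure of the tree, i.e., tree patterns that use $*$ only,
$$M_\theta^{*} = \{(t, p) \mid t \preccurlyeq_\theta p,\ \forall n \in N_p.\ \mathit{lab}_p(n) = *\}.$$
It is also interesting to see whether disallowing $*$ may change the picture,
$$M_\theta^{\circ} = \{(t, p) \mid t \preccurlyeq_\theta p,\ \forall n \in N_p.\ \mathit{lab}_p(n) \ne *\}.$$
Finally, we restrict the use of child and descendant edges in the tree pattern:
$$M_\theta^{|} = \{(t, p) \mid t \preccurlyeq_\theta p,\ \mathit{desc}_p = \emptyset\} \quad\text{and}\quad M_\theta^{||} = \{(t, p) \mid t \preccurlyeq_\theta p,\ \mathit{child}_p = \emptyset\}.$$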

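We make several general observations. First, we point out that the conditions on the various injective embeddings can be easily verified in polynomial time, and every embedding is a mapping whose size is bounded by the size of the tree pattern. Therefore, an embedding is a polynomial-size certificate, and we get the following.

Proposition 4.1. $M_\theta$, $M_\theta^{\deg\le k}$, $M_\theta^{\mathrm{height}\le k}$, $M_\theta^{*}$, $M_\theta^{\circ}$, $M_\theta^{|}$, and $M_\theta^{||}$ are in NP for any $\theta \in \{\mathrm{inj}, \mathrm{anc}, \mathrm{lca}\}$ and $k \ge 0$.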
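The guess-and-check argument behind Prop. 4.1 can be turned into an exponential brute-force decision procedure by trying every mapping from $N_p$ to $N_t$. The Python sketch below is our own illustration (hypothetical names, practical only for tiny inputs) and decides each of the four semantics this way. The small examples at the end are ours, not the trees of Fig. 1, but they likewise separate the levels of the hierarchy.

```python
from itertools import product
from typing import List, Optional

Parent = List[Optional[int]]   # parent[n]; None marks the root (node 0)


def ancestors(parent: Parent, n: Optional[int]) -> List[int]:
    out = []
    while n is not None:        # reflexive: n itself comes first
        out.append(n)
        n = parent[n]
    return out


def lca(parent: Parent, a: int, b: int) -> int:
    anc_a = set(ancestors(parent, a))
    return next(x for x in ancestors(parent, b) if x in anc_a)


def holds(sem, p_lab, p_par, p_desc, t_lab, t_par, h) -> bool:
    """Check conditions 1-4 and, depending on sem, 5', 5'' or 5''' for h."""
    if h[0] != 0:                                                 # 1.
        return False
    for n, m in enumerate(p_par):
        if p_lab[n] != '*' and p_lab[n] != t_lab[h[n]]:          # 4.
            return False
        if m is None:
            continue
        if p_desc[n]:                                             # 3.
            if h[n] == h[m] or h[m] not in ancestors(t_par, h[n]):
                return False
        elif t_par[h[n]] != h[m]:                                 # 2.
            return False
    pairs = list(product(range(len(p_par)), repeat=2))
    if sem == 'inj':
        return len(set(h)) == len(h)
    if sem == 'anc':
        return all((h[a] in ancestors(t_par, h[b])) == (a in ancestors(p_par, b))
                   for a, b in pairs)
    if sem == 'lca':
        return all(lca(t_par, h[a], h[b]) == h[lca(p_par, a, b)] for a, b in pairs)
    return True                                                   # 'std'


def embeds(sem, p_lab, p_par, p_desc, t_lab, t_par) -> bool:
    """Guess-and-check over all |N_t|^|N_p| mappings (exponential)."""
    return any(holds(sem, p_lab, p_par, p_desc, t_lab, t_par, h)
               for h in product(range(len(t_lab)), repeat=len(p_lab)))


# Hypothetical separating examples for the pattern r[.//a][.//b].
pat = (['r', 'a', 'b'], [None, 0, 0], [False, True, True])
print(embeds('inj', *pat, ['r', 'a', 'b'], [None, 0, 1]))          # r(a(b)): True
print(embeds('anc', *pat, ['r', 'a', 'b'], [None, 0, 1]))          # r(a(b)): False
print(embeds('anc', *pat, ['r', 'c', 'a', 'b'], [None, 0, 1, 1]))  # r(c(a,b)): True
print(embeds('lca', *pat, ['r', 'c', 'a', 'b'], [None, 0, 1, 1]))  # r(c(a,b)): False
```

Each candidate h is checked in polynomial time, which is the NP-membership argument; the exponential cost lies only in enumerating the candidates.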

By Prop. 3.3, for path patterns we employ the existing polynomial algorithm [6].

Proposition 4.2. $M_\theta^{\deg\le 1}$ is in P for any $\theta \in \{\mathrm{inj}, \mathrm{anc}, \mathrm{lca}\}$.

Finally, by Prop. 3.2 and Thm. 4.15, which shows the tractability of lca-preserving embeddings, we get the following.

Proposition 4.3. $M_\theta^{|}$ is in P for any $\theta \in \{\mathrm{inj}, \mathrm{anc}, \mathrm{lca}\}$.

4.1 Weakly-injective embeddings 
Theorem 4.4. $M_{\mathrm{inj}}$ is NP-complete.

Proof. We reduce SAT to $M_{\mathrm{inj}}$. We take a CNF formula $\varphi = c_1 \wedge \cdots \wedge c_k$ over the variables $x_1, \ldots, x_n$, and for every variable $x_i$ we construct two (linear) trees
$$X_i = x_i(\pi_1(\pi_2(\ldots \pi_{k-1}(\pi_k) \ldots))) \quad\text{and}\quad \bar{X}_i = x_i(\bar{\pi}_1(\bar{\pi}_2(\ldots \bar{\pi}_{k-1}(\bar{\pi}_k) \ldots))),$$
where $\pi_j = c_j$ if the clause $c_j$ uses the literal $x_i$ and $\pi_j = \bot$ otherwise, and analogously, $\bar{\pi}_j = c_j$ if the clause $c_j$ uses the literal $\neg x_i$ and $\bar{\pi}_j = \bot$ otherwise. The constructed tree is
$$t_\varphi = r(X_1, \bar{X}_1, X_2, \bar{X}_2, \ldots, X_n, \bar{X}_n),$$
and the constructed tree pattern is
$$p_\varphi = r[.//Y_1] \ldots [.//Y_n][.//c_1][.//c_2] \ldots [.//c_k],$$
where $Y_i = x_i/{*}/{*}/\cdots/{*}$ with exactly $k$ repetitions of $*$. Figure 2 illustrates the reduction for $\varphi = (x_1 \vee \neg x_3) \wedge (x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee \neg x_2)$. We claim that $(t_\varphi, p_\varphi) \in M_{\mathrm{inj}}$ iff $\varphi \in$ SAT.

Fig. 2. Reduction to $M_{\mathrm{inj}}$ for $\varphi = (x_1 \vee \neg x_3) \wedge (x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee \neg x_2)$.



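For the if part, we take a valuation $V$ satisfying $\varphi$ and construct a weakly-injective embedding $h$ as follows. The fragment $[.//Y_i]$ is mapped to $\bar{X}_i$ if $V(x_i) = \mathit{true}$ and to $X_i$ if $V(x_i) = \mathit{false}$. For each clause $c_j$ we pick one literal satisfied by $V$ and w.l.o.g. assume it is $x_i$, i.e., $c_j$ uses $x_i$ and $V(x_i) = \mathit{true}$. Then, the embedding $h$ maps the fragment $[.//c_j]$ to the node $c_j$ in the tree fragment $X_i$, which is not used by $[.//Y_i]$. Clearly, the constructed embedding is an injective function.

For the only if part, we take a weakly-injective embedding $h$ and construct a satisfying valuation $V$ as follows. The only nodes of $t_\varphi$ labeled $x_i$ are the roots of $X_i$ and $\bar{X}_i$, and since $Y_i$ uses child edges only and has $k+1$ nodes, the fragment $[.//Y_i]$ is mapped onto all the nodes of $X_i$ or onto all the nodes of $\bar{X}_i$. If $[.//Y_i]$ is mapped to $X_i$, then $V(x_i) = \mathit{false}$, and if $[.//Y_i]$ is mapped to $\bar{X}_i$, then $V(x_i) = \mathit{true}$. To show that $\varphi$ is satisfied by $V$, we take any clause $c_j$ and check where $h$ maps the fragment $[.//c_j]$. W.l.o.g. assume that it is a node of $X_i$, so $c_j$ uses $x_i$; since $h$ is weakly-injective, $Y_i$ is mapped to $\bar{X}_i$, and consequently, $V(x_i) = \mathit{true}$. Hence, $V$ satisfies $c_j$. □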
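The construction of $t_\varphi$ and $p_\varphi$ is easy to mechanize. The Python sketch below is an illustration under our own encoding, not part of the proof: literal `i` stands for $x_i$ and `-i` for $\neg x_i$, trees are `(label, children)` pairs, the pattern is an XPath string, and the label `bot` plays the role of $\bot$. It only builds the instance for the formula of Fig. 2; it does not decide the embedding.

```python
from typing import List, Tuple

Term = Tuple[str, list]          # (label, list of child terms)


def chain(labels: List[str]) -> Term:
    """Linear tree labels[0](labels[1](...(labels[-1])...))."""
    node: Term = (labels[-1], [])
    for lab in reversed(labels[:-1]):
        node = (lab, [node])
    return node


def reduce_sat_to_inj(clauses: List[List[int]], n: int) -> Tuple[Term, str]:
    """Build (t_phi, p_phi); 'bot' stands for the label written as ⊥."""
    k = len(clauses)
    subtrees = []
    for i in range(1, n + 1):
        pos = [f"c{j}" if i in c else "bot" for j, c in enumerate(clauses, 1)]
        neg = [f"c{j}" if -i in c else "bot" for j, c in enumerate(clauses, 1)]
        subtrees.append(chain([f"x{i}"] + pos))    # X_i
        subtrees.append(chain([f"x{i}"] + neg))    # bar X_i, root also labelled x_i
    t = ("r", subtrees)
    ys = "".join(f"[.//x{i}" + "/*" * k + "]" for i in range(1, n + 1))
    cs = "".join(f"[.//c{j}]" for j in range(1, k + 1))
    return t, "r" + ys + cs


phi = [[1, -3], [1, -2, 3], [-1, -2]]    # the formula of Fig. 2
t, p = reduce_sat_to_inj(phi, 3)
print(p)   # r[.//x1/*/*/*][.//x2/*/*/*][.//x3/*/*/*][.//c1][.//c2][.//c3]
print(t[1][0])   # X_1 = ('x1', [('c1', [('c2', [('bot', [])])])])
```

Both the tree and the pattern have size $O(nk)$, so the reduction is polynomial.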

We observe that in the reduction above the use of the child edges in the tree 
pattern is not essential and they can be replaced by descendant edges. 

Corollary 4.5. $M_{\mathrm{inj}}^{||}$ is NP-complete.

Furthermore, the proof of Thm. 4.4 can be easily adapted to the bounded degree setting. Indeed, one can easily show that for any tree $t = r(t_1, \ldots, t_k)$ and any tree pattern $p = r[.//p_1] \ldots [.//p_m]$, $t \preccurlyeq_{\mathrm{inj}} p$ if and only if $t' \preccurlyeq_{\mathrm{inj}} p'$, where $t' = A_1(\ldots A_m(t_1, \ldots, t_k) \ldots)$, $p' = A_1[.//p_1]/\ldots/A_m[.//p_m]$, and $A_1, \ldots, A_m$ are new symbols not used in $p$. This observation, when applied to the tree pattern in the reduction above, allows us to reduce the degree of the root node and to obtain a tree pattern of degree bounded by 2. Note, however, that this technique does not allow us to reduce the degree of nodes in arbitrary tree patterns.
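The degree-reducing transformation $(t, p) \mapsto (t', p')$ can be sketched as follows. The term encoding, `(label, children)` for trees and `(label, [(edge, subpattern), ...])` for patterns, and the fresh names `A1`, `A2`, ... are assumptions made for illustration; as in the statement, the sketch only handles a pattern root whose children are all attached by descendant edges.

```python
def reduce_root_degree(t, p):
    """Map t = r(t_1..t_k), p = r[.//p_1]..[.//p_m] to (t', p') with p' of degree <= 2."""
    _, subtrees = t
    _, branches = p
    m = len(branches)
    assert m >= 1 and all(edge == "//" for edge, _ in branches)
    names = [f"A{i}" for i in range(1, m + 1)]      # fresh labels, assumed unused in p
    t_new = (names[-1], list(subtrees))             # A_m(t_1, ..., t_k)
    for name in reversed(names[:-1]):
        t_new = (name, [t_new])                     # A_i(A_{i+1}(...))
    p_new = (names[-1], [("//", branches[-1][1])])  # A_m[.//p_m]
    for name, (_, sub) in zip(reversed(names[:-1]), reversed(branches[:-1])):
        p_new = (name, [("//", sub), ("/", p_new)])  # A_i[.//p_i]/A_{i+1}...
    return t_new, p_new


def max_degree(pat) -> int:
    """Maximum number of children of a node in a pattern term."""
    _, kids = pat
    return max([len(kids)] + [max_degree(sub) for _, sub in kids])


t = ("r", [("a", []), ("b", []), ("c", [])])
p = ("r", [("//", ("a", [])), ("//", ("b", [])), ("//", ("c", []))])
t2, p2 = reduce_root_degree(t, p)
print(max_degree(p), max_degree(p2))   # 3 2
```

The chain of fresh nodes $A_1, \ldots, A_m$ is occupied by the pattern nodes $A_1, \ldots, A_m$, so under an injective embedding every $p_i$ is forced into the original subtrees $t_1, \ldots, t_k$.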

Corollary 4.6. $M_{\mathrm{inj}}^{\deg\le k}$ is NP-complete for any $k \ge 2$.

A reduction similar to the one presented above can be used to construct patterns 
whose height is exactly 2. 

Theorem 4.7. $M_{\mathrm{inj}}^{\mathrm{height}\le k}$ is NP-complete for any $k \ge 2$.

If we consider patterns of height 1, where the children of the root node are leaves, then a diligent counting technique suffices to solve the problem.

Proposition 4.8. $M_{\mathrm{inj}}^{\mathrm{height}\le 1}$ is in P.

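Proof. Fix a tree pattern $p$ whose height is 1 and a tree $t$. For $a \in \Sigma \cup \{*\}$ we denote by $p^{|}_a$ the number of $a$-labeled |-children of $\mathit{root}_p$, by $p^{||}_a$ the number of $a$-labeled ||-children of $\mathit{root}_p$, and by $t^{=1}_a$ and $t^{>1}_a$ the numbers of $a$-labeled nodes of $t$ at depth $= 1$ and $> 1$, respectively.

We attempt to construct a weakly-injective embedding of $p$ in $t$ using the following strategy: (1) for every $a \in \Sigma$, we map the $p^{|}_a$ nodes to $a$-labeled nodes of $t$ at depth 1; (2) for every $a \in \Sigma$, we map the $p^{||}_a$ nodes to $a$-labeled nodes of $t$ at depth $> 1$, and if $p^{||}_a > t^{>1}_a$, we map the remaining $p^{||}_a - t^{>1}_a$ nodes to the still unused $a$-labeled nodes at depth 1; (3) we map the $p^{|}_*$ nodes to the remaining nodes of $t$ at depth 1; and (4) we map the $p^{||}_*$ nodes to the remaining non-root nodes of $t$. Using deeper nodes first in step (2) is safe because nodes at depth 1 are the only ones usable by |-children.

Clearly, provided that $\mathit{lab}_t(\mathit{root}_t) = \mathit{lab}_p(\mathit{root}_p)$ or $\mathit{lab}_p(\mathit{root}_p) = *$, this procedure succeeds and a weakly-injective embedding can be constructed if and only if the following inequalities are satisfied:
$$p^{|}_a \le t^{=1}_a \quad \text{for } a \in \Sigma, \qquad (1)$$
$$p^{||}_a \le t^{=1}_a + t^{>1}_a - p^{|}_a \quad \text{for } a \in \Sigma, \qquad (2)$$
$$p^{|}_* \le \sum_{a \in \Sigma} \bigl(t^{=1}_a - p^{|}_a - \max(p^{||}_a - t^{>1}_a, 0)\bigr), \qquad (3)$$
$$p^{||}_* \le \Bigl[\sum_{a \in \Sigma} \bigl(t^{=1}_a + t^{>1}_a - p^{|}_a - p^{||}_a\bigr)\Bigr] - p^{|}_*. \qquad (4)$$
Naturally, these inequalities can be verified in polynomial time. □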
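Inequalities (1)-(4) translate directly into a counting check. The Python sketch below is a hypothetical implementation under our own encoding: the pattern is given by its root label and a list of `(edge, label)` pairs for the children of its root, and the tree by label and parent arrays with root 0. It runs in polynomial time in the size of $t$.

```python
from collections import Counter
from typing import List, Optional, Tuple


def inj_height1(p_root: str, p_children: List[Tuple[str, str]],
                t_label: List[str], t_parent: List[Optional[int]]) -> bool:
    """Decide t <=_inj p for a pattern of height <= 1 using inequalities (1)-(4).
    p_children holds (edge, label) pairs with edge '/' or '//'; '*' is the wildcard."""
    if p_root != '*' and p_root != t_label[0]:
        return False

    def depth(v: int) -> int:
        d = 0
        while t_parent[v] is not None:
            v, d = t_parent[v], d + 1
        return d

    depths = [depth(v) for v in range(len(t_label))]
    t1 = Counter(lab for lab, d in zip(t_label, depths) if d == 1)   # t^{=1}_a
    tg = Counter(lab for lab, d in zip(t_label, depths) if d > 1)    # t^{>1}_a
    pc = Counter(lab for e, lab in p_children if e == '/')           # p^{|}_a
    pd = Counter(lab for e, lab in p_children if e == '//')          # p^{||}_a
    sigma = (set(t1) | set(tg) | set(pc) | set(pd)) - {'*'}

    if any(pc[a] > t1[a] for a in sigma):                            # (1)
        return False
    if any(pd[a] > t1[a] + tg[a] - pc[a] for a in sigma):            # (2)
        return False
    free_depth1 = sum(t1[a] - pc[a] - max(pd[a] - tg[a], 0) for a in sigma)
    if pc['*'] > free_depth1:                                        # (3)
        return False
    free_all = sum(t1[a] + tg[a] - pc[a] - pd[a] for a in sigma) - pc['*']
    return pd['*'] <= free_all                                       # (4)


lab, par = ['r', 'a', 'b', 'a'], [None, 0, 0, 2]                     # tree r(a, b(a))
print(inj_height1('r', [('/', 'a'), ('//', 'a')], lab, par))           # True
print(inj_height1('r', [('//', 'a'), ('//', 'a'), ('/', '*'), ('/', '*')], lab, par))  # False
```

In the second call both a-nodes are consumed by the two `.//a` children, so only one depth-1 node (the b-node) is left for two `*` |-children, and inequality (3) fails.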

Finally, we observe that while in the reductions above we use different labels to represent elements of a finite enumerable set, the same can be accomplished with patterns using $*$ labels only, where natural numbers are encoded with simple gadgets. The gadgets use the fact that a node of a tree pattern that has $k$ |-children can be mapped by a weakly-injective embedding only to a node having at least $k$ children. On the other hand, we can easily modify the reduction from Thm. 4.4 to yield tree patterns without $*$ nodes.

Theorem 4.9. $M_{\mathrm{inj}}^{*}$ and $M_{\mathrm{inj}}^{\circ}$ are NP-complete.

4.2 Ancestor-preserving embeddings 
Theorem 4.10. $M_{\mathrm{anc}}$ is NP-complete.

Proof. To prove NP-hardness we reduce SAT to $M_{\mathrm{anc}}$. We take a formula $\varphi = c_1 \wedge c_2 \wedge \cdots \wedge c_k$ over variables $x_1, \ldots, x_n$, and for every variable $x_i$ we construct two trees: $X_i = x_i(c_{j_1}, \ldots, c_{j_m})$ such that $c_{j_1}, \ldots, c_{j_m}$ are exactly the clauses using the literal $x_i$, and $\bar{X}_i = x_i(c_{j'_1}, \ldots, c_{j'_{m'}})$ such that $c_{j'_1}, \ldots, c_{j'_{m'}}$ are exactly the clauses using the literal $\neg x_i$. The constructed tree is
$$t_\varphi = r(X_1, \bar{X}_1, \ldots, X_n, \bar{X}_n),$$
and the tree pattern (written in XPath syntax) is
$$p_\varphi = r[x_1] \ldots [x_n][.//c_1] \ldots [.//c_k].$$
An example of the reduction for $\varphi = (x_1 \vee \neg x_3) \wedge (x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee \neg x_2)$ is presented in Fig. 3. The main claim is that $(t_\varphi, p_\varphi) \in M_{\mathrm{anc}}$ iff $\varphi \in$ SAT.

Fig. 3. Reduction to $M_{\mathrm{anc}}$ for $\varphi = (x_1 \vee \neg x_3) \wedge (x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee \neg x_2)$.

We prove it analogously to the main claim in the proof of Theorem 4.4: $[x_i]$ is mapped to the root of $\bar{X}_i$ when $x_i$ is true and to the root of $X_i$ otherwise. The use of ancestor-preserving embeddings ensures that the fragments $[x_i]$ and $[.//c_j]$ are not mapped to the same subtree of $t_\varphi$, so each $[.//c_j]$ must go to a $c_j$-leaf of a subtree whose root is not the image of any $[x_i]$. This reduction does not work for weakly-injective embeddings. □
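As with Thm. 4.4, the construction is mechanical. This Python sketch uses the same hypothetical encoding as before (literal `i` for $x_i$, `-i` for $\neg x_i$, trees as `(label, children)` pairs, the pattern as an XPath string) and builds $t_\varphi$ and $p_\varphi$ for the formula of Fig. 3.

```python
from typing import List, Tuple

Term = Tuple[str, list]      # (label, list of child terms)


def reduce_sat_to_anc(clauses: List[List[int]], n: int) -> Tuple[Term, str]:
    """Build (t_phi, p_phi) of Thm. 4.10; literal i is x_i and -i is NOT x_i."""
    subtrees = []
    for i in range(1, n + 1):
        for lit in (i, -i):          # X_i first, then bar X_i; both roots labelled x_i
            leaves = [(f"c{j}", []) for j, c in enumerate(clauses, 1) if lit in c]
            subtrees.append((f"x{i}", leaves))
    tree = ("r", subtrees)
    pattern = ("r" + "".join(f"[x{i}]" for i in range(1, n + 1))
               + "".join(f"[.//c{j}]" for j in range(1, len(clauses) + 1)))
    return tree, pattern


phi = [[1, -3], [1, -2, 3], [-1, -2]]        # the formula of Fig. 3
tree, pattern = reduce_sat_to_anc(phi, 3)
print(pattern)                               # r[x1][x2][x3][.//c1][.//c2][.//c3]
print([leaf for _, kids in tree[1] for leaf, _ in kids])
# ['c1', 'c2', 'c3', 'c2', 'c3', 'c2', 'c1']
```

Unlike the reduction for $M_{\mathrm{inj}}$, no chain gadgets are needed: ancestor preservation alone keeps $[.//c_j]$ out of the subtree claimed by $[x_i]$, and the pattern has height 1.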

We point out that in the proof above, the constructed pattern has height 1.

Corollary 4.11. $M_{\mathrm{anc}}^{\mathrm{height}\le k}$ is NP-complete for every $k \ge 1$.

Also, the use of child edges is not essential: they can be replaced by descendant edges, and the reduction does not use $*$ labels.

Corollary 4.12. $M_{\mathrm{anc}}^{||}$ and $M_{\mathrm{anc}}^{\circ}$ are NP-complete.

Bounding the degree of a node in the tree pattern renders, however, checking 
the existence of an ancestor-preserving embedding tractable. 

Theorem 4.13. For any $k \ge 0$, $M_{\mathrm{anc}}^{\deg\le k}$ is in P.

Proof. We fix a tree $t$ and a tree pattern $p$. For a node $m \in N_p$ we define $\Phi(m) = \{n \in N_t \mid t|_n \preccurlyeq_{\mathrm{anc}} p|_m\}$, where $t|_n$ is the subtree of $t$ rooted at $n$ (and similarly, we define $p|_m$). Naturally, $t \preccurlyeq_{\mathrm{anc}} p$ iff $\mathit{root}_t \in \Phi(\mathit{root}_p)$.

We fix a node $m \in N_p$ with children $m_1, \ldots, m_\ell$ (where $\ell \le k$), suppose that we have computed $\Phi(m_j)$ for every $j \in \{1, \ldots, \ell\}$, and take a node $n \in N_t$. We claim that $n$ belongs to $\Phi(m)$ if and only if the following two conditions are satisfied: 1) $\mathit{lab}_t(n) = \mathit{lab}_p(m)$ unless $\mathit{lab}_p(m) = *$, and 2) there is $(n_1, \ldots, n_\ell) \in \Phi(m_1) \times \cdots \times \Phi(m_\ell)$ such that a) $n_i$ is not an ancestor of $n_j$ for all $i \ne j$, b) $(n, n_i) \in \mathit{child}_t$ if $(m, m_i) \in \mathit{child}_p$, and c) $(n, n_i) \in \mathit{child}_t^+$ if $(m, m_i) \in \mathit{desc}_p$.

Since $k$ is bounded by a constant, the product $\Phi(m_1) \times \cdots \times \Phi(m_\ell)$ has size at most $|t|^k$, which is polynomial in the size of $t$, and therefore, the whole procedure works in polynomial time too. □
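A direct rendering of this bottom-up computation is sketched below in Python. It is an illustration under assumed encodings (parent arrays with root 0 and a flag for descendant edges), not the authors' code. It enumerates the product of the candidate sets explicitly, which is polynomial only because the pattern degree is a fixed constant.

```python
from itertools import permutations, product
from typing import List, Optional


def anc_embeds(p_label: List[str], p_parent: List[Optional[int]], p_desc: List[bool],
               t_label: List[str], t_parent: List[Optional[int]]) -> bool:
    """Decide t <=_anc p by computing Phi(m) = {n : t|n <=_anc p|m} bottom-up.
    Nodes are 0..N-1 with root 0; p_desc[m] marks a descendant edge into m."""
    p_children: List[List[int]] = [[] for _ in p_label]
    for v, u in enumerate(p_parent):
        if u is not None:
            p_children[u].append(v)

    def is_anc(a: int, b: Optional[int]) -> bool:   # reflexive ancestorship in t
        while b is not None:
            if a == b:
                return True
            b = t_parent[b]
        return False

    phi = {}

    def compute(m: int) -> None:
        for c in p_children[m]:
            compute(c)                                # children first (bottom-up)
        phi[m] = set()
        for n in range(len(t_label)):
            if p_label[m] != '*' and p_label[m] != t_label[n]:
                continue                              # condition 1) fails
            options = []                              # candidates respecting b) and c)
            for c in p_children[m]:
                if p_desc[c]:
                    options.append([v for v in phi[c] if v != n and is_anc(n, v)])
                else:
                    options.append([v for v in phi[c] if t_parent[v] == n])
            # condition a): the chosen images are pairwise incomparable
            if any(all(not is_anc(a, b) for a, b in permutations(tup, 2))
                   for tup in product(*options)):
                phi[m].add(n)

    compute(0)
    return 0 in phi[0]


pat = (['r', 'a', 'b'], [None, 0, 0], [False, True, True])       # r[.//a][.//b]
print(anc_embeds(*pat, ['r', 'a', 'b'], [None, 0, 1]))           # r(a(b)): False
print(anc_embeds(*pat, ['r', 'c', 'a', 'b'], [None, 0, 1, 1]))   # r(c(a,b)): True
```

For a leaf $m$ the product of zero candidate lists contains exactly one empty tuple, so $\Phi(m)$ is just the set of label-compatible nodes, matching condition 1) alone.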

Finally, gadgets similar to those in Thm. 4.9 allow us to dispose of labels altogether.

Theorem 4.14. $M_{\mathrm{anc}}^{*}$ is NP-complete.

4.3 LCA-preserving embeddings 
Theorem 4.15. $M_{\mathrm{lca}}$ is in P.

Proof. We fix a tree $t$ and a tree pattern $p$. For a node $m \in N_p$ we define $\Phi(m) = \{n \in N_t \mid t|_n \preccurlyeq_{\mathrm{lca}} p|_m\}$, where $t|_n$ is the subtree of $t$ rooted at $n$ (and similarly, we define $p|_m$). Naturally, $t \preccurlyeq_{\mathrm{lca}} p$ if and only if $\mathit{root}_t \in \Phi(\mathit{root}_p)$. We present a bottom-up procedure for computing $\Phi$.

We fix a node $m \in N_p$ with children $m_1, \ldots, m_k$, suppose that we have computed $\Phi(m_i)$ for every $i \in \{1, \ldots, k\}$, take a node $n \in N_t$, and let $n_1, \ldots, n_\ell$ be its children. We claim that $n$ belongs to $\Phi(m)$ if and only if the following two conditions are satisfied: 1) $\mathit{lab}_t(n) = \mathit{lab}_p(m)$ unless $\mathit{lab}_p(m) = *$, and 2) the bipartite graph $G = (X \cup Y, E)$ with $X = \{m_1, \ldots, m_k\}$, $Y = \{n_1, \ldots, n_\ell\}$, and
$$E = \{(m_i, n_j) \mid ((m, m_i) \in \mathit{child}_p \wedge n_j \in \Phi(m_i)) \vee ((m, m_i) \in \mathit{desc}_p \wedge \exists n' \in \Phi(m_i).\ (n_j, n') \in \mathit{child}_t^*)\}$$
has a matching of size $k$. A matching of size $k$ sends the children of $m$ into pairwise different child subtrees of $n$, which is exactly what makes the lowest common ancestor of any two of their images equal to $n$. In the construction of $E$ we use the expression $(n_j, n') \in \mathit{child}_t^*$ because a ||-child $m_i$ of $m$ needs to be mapped to a proper descendant of $n$, and these are exactly the descendants of the $n_j$'s. We finish by pointing out that a maximum matching of $G$ can be constructed in polynomial time [10]. □
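The matching-based test can be sketched in Python as follows, again under our own parent-array encoding. For simplicity the sketch uses Kuhn's augmenting-path algorithm to decide whether all children of $m$ can be matched; this is a swapped-in choice, and any polynomial maximum bipartite matching algorithm, such as the one cited in the proof, would serve equally well.

```python
from typing import List, Optional


def lca_embeds(p_label: List[str], p_parent: List[Optional[int]], p_desc: List[bool],
               t_label: List[str], t_parent: List[Optional[int]]) -> bool:
    """Decide t <=_lca p via Phi(m) = {n : t|n <=_lca p|m}, computed bottom-up."""
    def children(parent):
        ch = [[] for _ in parent]
        for v, u in enumerate(parent):
            if u is not None:
                ch[u].append(v)
        return ch

    p_ch, t_ch = children(p_parent), children(t_parent)

    def is_anc(a, b):                       # reflexive ancestorship in t
        while b is not None:
            if a == b:
                return True
            b = t_parent[b]
        return False

    def saturating_matching(adj, n_right):
        """Kuhn's algorithm: can every left vertex u (neighbours adj[u]) be matched?"""
        owner = [None] * n_right            # owner[j] = left vertex matched to j
        def augment(u, seen):
            for j in adj[u]:
                if j not in seen:
                    seen.add(j)
                    if owner[j] is None or augment(owner[j], seen):
                        owner[j] = u
                        return True
            return False
        return all(augment(u, set()) for u in range(len(adj)))

    phi = {}

    def compute(m):
        for c in p_ch[m]:
            compute(c)
        phi[m] = set()
        for n in range(len(t_label)):
            if p_label[m] != '*' and p_label[m] != t_label[n]:
                continue
            kids = t_ch[n]
            adj = []
            for c in p_ch[m]:
                if p_desc[c]:   # ||-child: some image at or below the tree child n_j
                    adj.append([j for j, nj in enumerate(kids)
                                if any(is_anc(nj, v) for v in phi[c])])
                else:           # |-child: n_j itself must be an image
                    adj.append([j for j, nj in enumerate(kids) if nj in phi[c]])
            if saturating_matching(adj, len(kids)):
                phi[m].add(n)

    compute(0)
    return 0 in phi[0]


pat = (['r', 'a', 'b'], [None, 0, 0], [False, True, True])       # r[.//a][.//b]
print(lca_embeds(*pat, ['r', 'c', 'a', 'b'], [None, 0, 1, 1]))   # r(c(a,b)): False
print(lca_embeds(*pat, ['r', 'c', 'a', 'b'], [None, 0, 1, 0]))   # r(c(a), b): True
```

In the first tree both pattern branches can only use the single child c of the root, so no matching of size 2 exists; in the second tree the branches go to the different children c and b.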

5 Related work 

Model checking for tree patterns has been studied in the literature in a variety of variants depending on the requirements on the corresponding embeddings. They may or may not have to be injective, or to preserve various properties like the order among siblings, ancestor or child relationships, label equalities, etc. In this paper, we consider unordered injective embeddings that additionally may be ancestor- or lca-preserving.

Kilpeläinen and Mannila [12] studied the unordered tree inclusion problem, defined as follows: given labeled trees P and T, can P be obtained from T by deleting nodes? Here, deleting a node u entails removing all edges incident to u and, if u has a parent v, replacing the edge from v to u by edges from v to the children of u. The unordered tree inclusion problem is equivalent to model checking for ancestor-preserving embeddings where the tree pattern contains descendant edges only. [12] shows NP-completeness for tree patterns of height 1. Moreover, [13] shows that the problem remains NP-complete when all labels in both trees are * or when the degrees of all vertices except the root are at most 3. These two results subsume our Thms. 4.10 and 4.14. [12,14] also show the tractability of the problem when the degrees of all nodes in the tree pattern are bounded. Thm. 4.13 generalizes this to allow also child edges in the tree patterns.

The tree inclusion problem is a special case of the minor containment problem for graphs [16,17]: given two graphs G and H, decide whether G contains H as a minor, or equivalently, whether H can be obtained from a subgraph of G by edge contractions, where contracting an edge means replacing the edge and its two incident vertices by a single new vertex. For trees, edge contraction is equivalent to node deletion. Since minor containment is known to be NP-complete even for trees, this gives another proof of NP-completeness for ancestor-preserving embeddings.

Valiente [18] introduced the constrained unordered tree inclusion problem, where the question is, given labeled trees P and T, whether P can be obtained from T by deleting nodes of degree one or two. The polynomial-time algorithm given there is based on earlier results on subtree homeomorphisms [2], where unlabeled trees are considered. The constrained unordered tree inclusion problem is equivalent to model checking of lca-preserving embeddings where all edges in the tree pattern are descendant edges. Our Thm. 4.15 slightly generalizes the above result by allowing also child edges in the pattern.

David [3] studied the complexity of ancestor-preserving embeddings of tree patterns with data comparison (equality and inequality) and showed their NP-completeness. Although we show that ancestor-preserving embeddings are NP-complete even without data comparisons, the reductions used in [3] construct tree patterns of bounded degree, for which ancestor-preserving embeddings without data comparisons are tractable (Thm. 4.13). This shows that adding data comparisons indeed increases the computational complexity of the model checking problem.

Recently, Fan et al. [5] studied 1-1 p-homomorphisms, which extend injective graph homomorphisms by relaxing the edge preservation condition. Namely, the edges have to be mapped to nonempty paths. However, neither the internal vertices nor the edges within the paths have to be disjoint. In the case of trees, 1-1 p-homomorphisms correspond to the weakly-injective embeddings that we consider in this paper. By a reduction from the exact cover by 3-sets problem, they have shown NP-completeness of model checking in the case where the first graph is a tree and the second is a DAG. We improve this result in Thms. 4.4 and 4.9.

When embeddings have to preserve the order among siblings, model checking becomes much easier. The ordered tree inclusion problem was initially introduced by Knuth [13, exercise 2.3.2-22], who gave a sufficient condition for testing inclusion. The polynomial-time algorithm from [12] is based on dynamic programming and at each level may compute the inclusion greedily from left to right thanks to the order preservation requirement. Tree inclusion is also related to ordered tree pattern matching [1], where embeddings have to preserve the order and the child relationship, but they need not preserve the root.

6 Conclusions and future work 



We have considered three different notions of injective embeddings of tree patterns, and for each of them we have studied the problem of model checking. Table 1 summarizes the complexity results.

Table 1. Summary of complexity results

All our results extend to embeddings between pairs of tree patterns, used for instance in static query analysis [11]. Although some of our results are subsumed by or can be easily obtained from existing results, our reductions and algorithms are simple and clean. In particular, we show intractability with direct reductions from SAT.

In the future, we would like to find out whether there is an algorithm for checking lca-preserving embeddings that does not rely on constructing maximum matchings in bipartite graphs. The exact bound on the complexity of non-injective embeddings of tree patterns is a difficult open problem [7], and it would be interesting to see whether establishing exact bounds for the tractable cases of injective embeddings is any easier.
Appendix: Omitted proofs



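Proposition 3.1. For any tree $t$ and tree pattern $p$, 1) $t \preccurlyeq_{\mathrm{lca}} p \Rightarrow t \preccurlyeq_{\mathrm{anc}} p$, 2) $t \preccurlyeq_{\mathrm{anc}} p \Rightarrow t \preccurlyeq_{\mathrm{inj}} p$, and 3) $t \preccurlyeq_{\mathrm{inj}} p \Rightarrow t \preccurlyeq_{\mathrm{std}} p$.

Proof. Assume that $t \preccurlyeq_{\mathrm{lca}} p$ and let $h$ be an lca-preserving embedding. Consider any $n_1, n_2$ such that $(h(n_1), h(n_2)) \in \mathit{child}_t^*$. Then $\mathit{lca}_t(h(n_1), h(n_2)) = h(n_1)$, and since $h$ is lca-preserving, $h(\mathit{lca}_p(n_1, n_2)) = h(n_1)$. The node $\mathit{lca}_p(n_1, n_2)$ is an ancestor of $n_1$, and every edge of $p$ is mapped to a nonempty path in $t$, so if $\mathit{lca}_p(n_1, n_2) \ne n_1$, then $h(\mathit{lca}_p(n_1, n_2))$ would be a proper ancestor of $h(n_1)$. Hence $\mathit{lca}_p(n_1, n_2) = n_1$, and therefore $n_1$ is an ancestor of $n_2$. The converse implication in condition 5'' holds for every embedding by conditions 2 and 3. So $h$ is ancestor-preserving.

For the proof of 2, consider $p$, $t$ such that $t \preccurlyeq_{\mathrm{anc}} p$ and let $h$ be an ancestor-preserving embedding. We show that $h$ is injective. Assume that there are $n_1, n_2$ such that $h(n_1) = h(n_2)$. Then $(h(n_1), h(n_2)) \in \mathit{child}_t^*$ and $(h(n_2), h(n_1)) \in \mathit{child}_t^*$, and since $h$ is ancestor-preserving, $(n_1, n_2) \in (\mathit{child}_p \cup \mathit{desc}_p)^*$ and $(n_2, n_1) \in (\mathit{child}_p \cup \mathit{desc}_p)^*$. As this relation is acyclic, $n_1 = n_2$. Hence $t \preccurlyeq_{\mathrm{inj}} p$.

Finally, implication 3 follows from the fact that any weakly-injective embedding is an embedding. □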

Proposition 3.2. For any tree $t$ and any tree pattern $p$ that does not use descendant edges, $t \preccurlyeq_{\mathrm{inj}} p$ iff $t \preccurlyeq_{\mathrm{anc}} p$ iff $t \preccurlyeq_{\mathrm{lca}} p$.

Proof. Assume that $p$ does not use descendant edges. By Proposition 3.1 it is enough to prove that $t \preccurlyeq_{\mathrm{inj}} p$ implies $t \preccurlyeq_{\mathrm{lca}} p$.

Let $t \preccurlyeq_{\mathrm{inj}} p$ and let $h$ be a weakly-injective embedding of $p$ in $t$. It is easy to see that $h$ is an isomorphism onto a substructure of $t$, and therefore $h$ is lca-preserving and $t \preccurlyeq_{\mathrm{lca}} p$. Together with Proposition 3.1, this implies all the equivalences. □

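Proposition 3.3. For any tree $t$ and any path pattern $p$, $t \preccurlyeq_{\mathrm{std}} p$ iff $t \preccurlyeq_{\mathrm{inj}} p$ iff $t \preccurlyeq_{\mathrm{anc}} p$ iff $t \preccurlyeq_{\mathrm{lca}} p$.

Proof. Assume that $p$ is a path pattern, $t \preccurlyeq_{\mathrm{std}} p$, and let $h$ be an embedding of $p$ in $t$. Since $p$ is a path, any two of its nodes lie on a common root path; consider nodes $n, m$ of $p$ such that there is a path from $n$ to $m$. Clearly, $\mathit{lca}_p(n, m) = n$. By conditions 2 and 3 of the definition of embeddings, there is also a path from $h(n)$ to $h(m)$, hence $\mathit{lca}_t(h(n), h(m)) = h(n)$. Therefore $h$ is lca-preserving and $t \preccurlyeq_{\mathrm{lca}} p$. By Proposition 3.1 we obtain the required equivalence. □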

Theorem 4.7. $M_{\mathrm{inj}}^{\mathrm{height}\le k}$ is NP-complete for any $k \ge 2$.

Proof. We show how to build, for a given instance $\varphi$ of the SAT problem, a pattern $p_\varphi$ and a tree $t_\varphi$ such that $t_\varphi \preccurlyeq_{\mathrm{inj}} p_\varphi$ if and only if $\varphi$ is satisfiable. Let $\varphi = c_1 \wedge c_2 \wedge \cdots \wedge c_k$ be an instance of SAT over variables $x_1, \ldots, x_n$. We set $\Sigma = \{a, c_1, \ldots, c_k, s_1, \ldots, s_n\}$.

For each i, we define the tree Xi as follows. Its root is a node and it is 
connected to + 1 nodes, namely xf,p},pf, . . . , . Node xf has fc-l-2 successors, 
namely Si, nj, . . . , n^^^ . All other nodes have no successors. 

The tree $t_\varphi$ consists of a root $r$ and its $n$ disjoint successors $X_1, \ldots, X_n$ (see Fig. 4).

Now we define the labeling of t^p. Let , . . . , Ci, be the clauses with the posi- 
tive occurrence of Xi, and Cj^ , . . . , Cj^, be the clauses with the negative occurrence 



of Xi. For all s ^ we label Ps with q^. Similarly, for all s ^ /' we label by 
Cj^ . We label Si by Si and all other nodes by a. 

The pattern $p_\varphi$ is as presented in Fig. 4. Clearly, its height is bounded by 2.




Assume that $t_\varphi \preccurlyeq_{\mathrm{inj}} p_\varphi$ and let $h$ be the corresponding embedding. Let $Y$ be the set of all the successors of $\mathit{root}_p$ labeled by $*$ in $p_\varphi$ and let $h(Y)$ be the image of
Y. A quick check shows that for each i there is exactly one node from {a;^, a;"} 
in h{Y). We define the valuation for ip such that Xi is positive if x" e h{Y) and 
negative otherwise. 

Consider any clause Cs and let m be the node in p^ labeled by c^. Assume 
that h{m) = for some i, j. It means that Xi occurs positively in Cg and that 
x^ does not belong to h(Y) — otherwise, if = h{m!) for some m\ then all 
successors of would be results of h applied to successors of m', contradicting 
the facts that h{m) = pi and h is injective. Therefore, Cg is satisfied. 

The proof that if $\varphi$ is satisfiable then $t_\varphi \preccurlyeq_{\mathrm{inj}} p_\varphi$ should now be straightforward. □

Theorem 4.9. $M_{\mathrm{inj}}^{*}$ and $M_{\mathrm{inj}}^{\circ}$ are NP-complete.

Proof. For the $M_{\mathrm{inj}}^{*}$ case, we simply adjust the proof of Theorem 4.7, taking advantage of the fact that all the nodes with labels different from $*$ are leaves.

For each $s \in \mathbb{N}$, we define a tree $T_s$ that consists of two nodes with $k + 3$ successors and a path of length $s$ connecting them (see Fig. 5). Note that the original tree $t_\varphi$ does not contain any node of degree $\ge k + 3$.

We replace all nodes (in $t_\varphi$ and $p_\varphi$) labeled by $c_s$ by $T_s$, and all nodes labeled by $s_i$ by $T_{k+i}$. Then, in $t_\varphi$, we replace all labels by $a$. It is readily checked that for any $i \ne j$ there is no embedding from $T_i$ to $T_j$, and since $t_\varphi$ contains no nodes with degree at least $k + 3$, there is an embedding from the modified pattern to the modified tree if and only if $\varphi$ is satisfiable.

Fig. 5. The tree $T_s$.

For the $M_{\mathrm{inj}}^{\circ}$ case, we simply replace all $*$ in the pattern defined above by $a$, the only label present in the tree. □

Theorem 4.14. $M_{\mathrm{anc}}^{*}$ is NP-complete.

Proof. We modify the proof of Theorem 4.10. First, we adjust the tree and the pattern by adding one node below each node labeled $x_i$, labeling the new node by $x_i$ and relabeling the old node by $a$. We also replace $r$ by $a$ and all $a$ in the pattern by $*$ (see Fig. 6).




Fig. 6. Adjusted reduction to $M_{\mathrm{anc}}$ for $\varphi = (x_1 \vee \neg x_3) \wedge (x_1 \vee \neg x_2 \vee x_3) \wedge (\neg x_1 \vee \neg x_2)$.



The obtained tree and pattern have the following property: the only nodes that are not labeled by $*$ or $a$ are leaves. In virtually the same way as in Theorem 4.9, we can replace them by trees $T_s$. □



